Continuity estimates on the Tsallis relative entropy 



Alexey E. Rastegin 

Department of Theoretical Physics, Irkutsk State University, Gagarin Bv. 20, Irkutsk 664003, Russia 

Continuity properties of the Tsallis relative entropy are examined. The monotonicity of the quantum $f$-divergence leads to a consequence which is convenient for estimating this measure from below. For order $\alpha\in(0;1)$, a family of lower continuity bounds of Pinsker type is obtained. For $\alpha>1$ and the commutative case, upper continuity bounds on the relative entropy in terms of the minimal probability in its second argument are derived. Both the lower and upper bounds presented are reformulated for the case of Renyi's entropies. The Fano inequality is extended to Tsallis' entropies for all $\alpha>0$. The deduced bounds on the Tsallis conditional entropy are used for obtaining inequalities of Fannes type.
I. INTRODUCTION


Physical systems involving long-range interactions, long-time memories, or fractal structures can hardly be treated within the traditional background of statistical physics. The Tsallis entropies have been widely adopted in this direction. As a rule, stationary states of such systems are described by one-parametric extensions of the Zipf-Mandelbrot power-law distribution. Generalized entropies have also found use as alternative measures of informational content. For instance, the entropic uncertainty principle has been expressed in terms of both the Renyi [43] and the Tsallis entropies [23], [32]. Studies of generalized entropies also help to understand some properties of the standard entropy. The connection between strong subadditivity of the von Neumann entropy and the Wigner-Yanase-Dyson conjecture is a remarkable example (see […] and references therein).

The relative entropy, or Kullback-Leibler divergence [21], is frequently used as a measure of statistical distinguishability. Csiszar's $f$-divergence […] and Petz's quasi-entropies [25], [27] are famous generalizations of the Kullback-Leibler measure to the classical and quantum cases, respectively. In both the classical and quantum regimes, properties of the relative entropy are the subject of active research. So the development of a standard background for generalized entropies is an important issue. In the present paper, we examine continuity properties of the Tsallis relative entropy. The obtained bounds are expressed in terms of the trace distance between two probability distributions or density operators. In this regard, our bounds characterize a continuity property in the sense of Fannes [2].
The paper is organized as follows. In Section II, the main definitions are given. One consequence of the monotonicity of the quantum $f$-divergence is considered in Section III. A family of lower continuity bounds on the Tsallis relative entropy of order $\alpha\in(0;1)$ is derived in Section IV. In essence, these inequalities are one-parametric extensions of the Pinsker inequality. The case of the Renyi relative entropy is considered as well. In Section V, upper continuity bounds on the Tsallis relative $\alpha$-entropy of two probability distributions in terms of the minimal probability in its second argument are obtained. Fano type upper bounds on the conditional Tsallis entropy are derived for all $\alpha>0$ in Section VI. As is shown, these bounds lead to generalized inequalities of Fannes type.
II. DEFINITIONS AND NOTATION



In the classical regime, we will consider probability distributions over a finite index set $\Omega$ of cardinality $N$. The trace distance between probability distributions $P=\{p(x)\}$ and $Q=\{q(x)\}$ is then defined as

$$D(P,Q) := \frac{1}{2}\sum_{x\in\Omega}|p(x)-q(x)| . \tag{2.1}$$

Let $\mathcal{L}(\mathcal{H})$ be the space of linear operators on a finite-dimensional Hilbert space $\mathcal{H}$. We also use the notation $\mathcal{L}_+(\mathcal{H})$ for the set of positive semidefinite operators. For any operator $X$, we define $|X|\in\mathcal{L}_+(\mathcal{H})$ as the unique positive square root of $X^*X\geq 0$. The trace norm $\|X\|_1:=\mathrm{tr}|X|$ and the trace distance, defined as

$$D(X,Y) := \frac{1}{2}\|X-Y\|_1 = \frac{1}{2}\,\mathrm{tr}|X-Y| , \tag{2.2}$$

are widely used in both mathematical physics and quantum information theory. Using the Ky Fan norms, partitioned versions of the above measures can be adopted as well [29], [31]. By $\ker(X)$ and $\mathrm{supp}(X)$ we denote the kernel and the support of an operator $X$. The eigenvalues of $X$ form the multiset $\mathrm{spec}(X)$. For two operators $X$ and $Y$ on $\mathcal{H}$, we define the Hilbert-Schmidt inner product by

$$\langle X,Y\rangle_{\mathrm{hs}} := \mathrm{tr}(X^*Y) . \tag{2.3}$$

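For positive $\alpha\neq 1$, the Tsallis $\alpha$-entropy of a probability distribution $P=\{p(x)\}$ is defined by [38]

$$H_\alpha(P) := \frac{1}{1-\alpha}\Big(\sum_{x\in\Omega}p(x)^\alpha-1\Big) = -\sum_{x\in\Omega}p(x)^\alpha\ln_\alpha p(x) , \tag{2.4}$$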

where $\ln_\alpha z=(z^{1-\alpha}-1)/(1-\alpha)$ is the $\alpha$-logarithm. The maximal value $\ln_\alpha N$ is reached for the uniform distribution, when $p(x)=1/N$ for all $x\in\Omega$. The Shannon entropy $H_1(P)=-\sum_x p(x)\ln p(x)$ is obtained in the limit $\alpha\to 1$. By $h_\alpha(u)$ we denote the binary Tsallis $\alpha$-entropy, that is,

$$h_\alpha(u) := H_\alpha(\{u,1-u\}) = -u^\alpha\ln_\alpha u-(1-u)^\alpha\ln_\alpha(1-u) \qquad (u\in[0;1]) . \tag{2.5}$$

This function is concave, since its second derivative $h_\alpha''(u)=-\alpha\big(u^{\alpha-2}+(1-u)^{\alpha-2}\big)$ is negative. It is clear that $h_\alpha(u)=h_\alpha(1-u)$. Another important one-parametric generalization is the Renyi entropy (see, e.g., […], [26] and references therein)

$$R_\alpha(P) := \frac{1}{1-\alpha}\ln\Big(\sum_{x\in\Omega}p(x)^\alpha\Big) . \tag{2.6}$$

The entropies (2.4) and (2.6) are connected by the equality $(1-\alpha)R_\alpha(P)=\ln[1+(1-\alpha)H_\alpha(P)]$. The quantum analogs of these entropies are respectively defined as

$$H_\alpha(\rho) := \frac{1}{1-\alpha}\big(\mathrm{tr}(\rho^\alpha)-1\big) , \tag{2.7}$$

$$R_\alpha(\rho) := \frac{1}{1-\alpha}\ln\big[\mathrm{tr}(\rho^\alpha)\big] . \tag{2.8}$$
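The classical quantities (2.4)-(2.6) are easy to evaluate directly. The Python sketch below is only an illustration: the function names `ln_alpha`, `tsallis`, `renyi` and `binary_tsallis` are our own, and the test distribution is arbitrary. It can be used to confirm the connection $(1-\alpha)R_\alpha(P)=\ln[1+(1-\alpha)H_\alpha(P)]$, the equality of the two forms in (2.4), and the maximal value $\ln_\alpha N$.

```python
import math


def ln_alpha(z, alpha):
    """alpha-logarithm (z**(1 - alpha) - 1) / (1 - alpha); natural logarithm when alpha == 1."""
    if alpha == 1:
        return math.log(z)
    return (z ** (1 - alpha) - 1) / (1 - alpha)


def tsallis(p, alpha):
    """Tsallis alpha-entropy (2.4); zero probabilities are skipped."""
    if alpha == 1:  # Shannon limit
        return -sum(x * math.log(x) for x in p if x > 0)
    return (sum(x ** alpha for x in p if x > 0) - 1) / (1 - alpha)


def renyi(p, alpha):
    """Renyi alpha-entropy (2.6) for alpha != 1."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def binary_tsallis(u, alpha):
    """Binary Tsallis entropy h_alpha(u) of (2.5)."""
    return tsallis([u, 1 - u], alpha)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]
    for alpha in (0.5, 2.0):
        # (1 - alpha) R_alpha - ln[1 + (1 - alpha) H_alpha] should vanish up to rounding
        gap = (1 - alpha) * renyi(p, alpha) - math.log(1 + (1 - alpha) * tsallis(p, alpha))
        print(alpha, tsallis(p, alpha), ln_alpha(len(p), alpha), gap)
```

For instance, with $\alpha=2$ and the uniform distribution on four points, `tsallis` gives 0.75, which coincides with $\ln_2 4$.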

Subadditivity of the quantum Tsallis entropy (2.7) for $\alpha>1$ was conjectured by Raggio [28] and later proved by Audenaert […]. This result has been extended to some of the so-called unified entropies [33]. The subadditivity property was generally believed to hold for the Wigner-Yanase entropy, until counterexamples were given [17], [37]. Meanwhile, purity of the bipartite state is sufficient for subadditivity. Other sufficient conditions for subadditivity of the Wigner-Yanase entropy are obtained in […].

The standard relative entropy of $P=\{p(x)\}$ to $Q=\{q(x)\}$ is defined as $H_1(P\|Q)=-\sum_x p(x)\ln[q(x)/p(x)]$ […]. For density operators $\rho$ and $\sigma$, the quantum relative entropy is expressed as [26]

$$H_1(\rho\|\sigma) := \mathrm{tr}(\rho\ln\rho-\rho\ln\sigma) . \tag{2.9}$$

In the classical regime, the Tsallis relative $\alpha$-entropy is introduced by […]

$$H_\alpha(P\|Q) := -\sum_{x\in\Omega}p(x)\ln_\alpha\frac{q(x)}{p(x)} = \frac{1}{1-\alpha}\Big(1-\sum_{x\in\Omega}p(x)^\alpha q(x)^{1-\alpha}\Big) . \tag{2.10}$$

Basic properties of this measure are discussed in […], [14]. The Renyi relative entropy is defined as […]

$$R_\alpha(P\|Q) := \frac{1}{\alpha-1}\ln\Big(\sum_{x\in\Omega}p(x)^\alpha q(x)^{1-\alpha}\Big) . \tag{2.11}$$

It is convenient to extend the definition (2.10) to any nonnegative functions $A$ and $B$ on the finite set $\Omega$. For a given set $A=\{a(x)\}$, we put the index subset $\Omega_A=\{x: a(x)\neq 0\}$ and its complement $\bar\Omega_A$. For $\alpha>1$, the "Tsallis relative $\alpha$-entropy" of $A=\{a(x)\}$ to $B=\{b(x)\}$ is defined as

$$H_\alpha(A\|B) := \begin{cases} \dfrac{1}{\alpha-1}\Big(\sum_{x\in\Omega_A}a(x)^\alpha b(x)^{1-\alpha}-\sum_{x\in\Omega_A}a(x)\Big) , & \bar\Omega_B\subseteq\bar\Omega_A , \\ +\infty , & \text{otherwise} . \end{cases} \tag{2.12}$$

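Omitting the second entry, that is, using the first line for all $A$ and $B$, we obtain the definition for $0<\alpha<1$. For any positive scalar $\lambda$, we have

$$H_\alpha(\lambda A\|\lambda B) = \lambda H_\alpha(A\|B) , \tag{2.13}$$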






i.e., it is a homogeneous function of degree one. For $\alpha\in(0;1)$ and density operators $\rho$ and $\sigma$, we define the Tsallis relative entropy as

$$H_\alpha(\rho\|\sigma) := \frac{1}{1-\alpha}\big(1-\mathrm{tr}(\rho^\alpha\sigma^{1-\alpha})\big) . \tag{2.14}$$

For $\alpha>1$, the right-hand side of (2.14) is well-defined whenever $\ker(\sigma)\subseteq\ker(\rho)$. In the singular case, when $\ker(\sigma)\cap\mathrm{supp}(\rho)\neq\{0\}$, the right-hand side of (2.14) is treated in the same way as the standard relative entropy (2.9): such relative entropies are defined to be $+\infty$. Extending (2.11) to the quantum case, we define

$$R_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1}\ln\big[\mathrm{tr}(\rho^\alpha\sigma^{1-\alpha})\big] . \tag{2.15}$$

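For these entropies, we have the equality

$$(\alpha-1)R_\alpha(\rho\|\sigma) = \ln\big[1+(\alpha-1)H_\alpha(\rho\|\sigma)\big] , \tag{2.16}$$

and the same relation holds in the classical setting. For $\alpha>1$ and $A,B\in\mathcal{L}_+(\mathcal{H})$, we also introduce

$$H_\alpha(A\|B) := \begin{cases} \dfrac{1}{\alpha-1}\big(\mathrm{tr}(A^\alpha B^{1-\alpha})-\mathrm{tr}(A)\big) , & \ker(B)\subseteq\ker(A) , \\ +\infty , & \text{otherwise} . \end{cases} \tag{2.17}$$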
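A minimal Python sketch of the classical relative entropies (2.10)-(2.12), assuming plain lists of nonnegative numbers; the names `tsallis_rel` and `renyi_rel` are ours, not taken from any library. It can be used to check the classical form of (2.16) and the homogeneity (2.13) on concrete data.

```python
import math


def tsallis_rel(a, b, alpha):
    """Tsallis relative alpha-entropy (2.12) of nonnegative lists a, b (alpha > 0, alpha != 1).

    Sums run over Omega_A = {x : a[x] != 0}. For alpha > 1 the value is +inf
    unless every zero of b is also a zero of a (the second line of (2.12)).
    """
    if alpha > 1 and any(x > 0 and y == 0 for x, y in zip(a, b)):
        return math.inf
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(a, b) if x > 0)
    return (s - sum(x for x in a if x > 0)) / (alpha - 1)


def renyi_rel(p, q, alpha):
    """Renyi relative alpha-entropy (2.11) of probability lists p, q."""
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(p, q) if x > 0)
    return math.log(s) / (alpha - 1)


if __name__ == "__main__":
    P, Q = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
    for alpha in (0.5, 2.0):
        H = tsallis_rel(P, Q, alpha)
        # classical form of (2.16): (alpha - 1) R = ln[1 + (alpha - 1) H]
        print(alpha, H, (alpha - 1) * renyi_rel(P, Q, alpha) - math.log(1 + (alpha - 1) * H))
```

For normalized inputs the first line of (2.12) reduces to (2.10), which is why a single function covers both definitions.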



III. A CONSEQUENCE OF MONOTONICITY OF THE f-DIVERGENCE

In this section, we prove one result which will be used for obtaining quantum bounds of Pinsker type. The Tsallis relative entropy (2.10) is closely related to the Csiszar $f$-divergence [8]. Let $z\mapsto f(z)$ be a convex function on $z\in[0;+\infty)$ with $f(1)=0$. The Csiszar $f$-divergence of $P=\{p(x)\}$ from $Q=\{q(x)\}$ is defined as [1], […]

$$S_f(P\|Q) := \sum_{x\in\Omega}q(x)\,f\Big(\frac{p(x)}{q(x)}\Big) . \tag{3.1}$$

Taking $f_\alpha(z)=(\alpha-1)^{-1}z^\alpha$ with positive $\alpha\neq 1$ and adding the corresponding constant $-(\alpha-1)^{-1}$, the formula (3.1) leads to the right-hand side of (2.10). The definition (3.1) can generally be used without the normalization condition.

In the following, we use the convention that powers of a positive semidefinite operator are only taken on its support. Namely, by $A^{-1}$ and $A^0$ we respectively denote the generalized inverse of $A$ and the projection onto its support. A quantum counterpart of Csiszar's $f$-divergence is introduced as follows [18]. For an operator $A\in\mathcal{L}_+(\mathcal{H})$, let $\mathsf{L}_A$ and $\mathsf{R}_A$ denote the left and the right multiplications by $A$, respectively, defined as

$$\mathsf{L}_A: X\mapsto AX , \qquad \mathsf{R}_A: X\mapsto XA , \qquad X\in\mathcal{L}(\mathcal{H}) . \tag{3.2}$$

Left and right multiplications commute with each other, namely $\mathsf{L}_A\mathsf{R}_B=\mathsf{R}_B\mathsf{L}_A$ for $A,B\in\mathcal{L}_+(\mathcal{H})$. Let $z\mapsto f(z)$ be a continuous function on $z\in[0;+\infty)$. Taking the set $\{ab^{-1}: a\in\mathrm{spec}(A),\ b\in\mathrm{spec}(B)\}$, we write [18]

$$f(\mathsf{L}_A\mathsf{R}_B^{-1}) = \sum_{a\in\mathrm{spec}(A)}\sum_{b\in\mathrm{spec}(B)}f(ab^{-1})\,\mathsf{L}_{P_a}\mathsf{R}_{Q_b} , \tag{3.3}$$

where the formulas $A=\sum_a aP_a$ and $B=\sum_b bQ_b$ express the spectral decompositions of $A$ and $B$, respectively. If $\ker(B)\subseteq\ker(A)$, then the $f$-divergence of $A$ with respect to $B$ is defined as [18]

$$S_f(A\|B) := \big\langle B^{1/2}, f(\mathsf{L}_A\mathsf{R}_B^{-1})B^{1/2}\big\rangle_{\mathrm{hs}} . \tag{3.4}$$
Let $\mathbb{1}$ be the identity operator. In the general case, the quantum $f$-divergence is defined by the formula

$$S_f(A\|B) := \lim_{\varepsilon\searrow 0}S_f(A\|B+\varepsilon\mathbb{1}) . \tag{3.5}$$

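Basic properties of the quantity (3.4) are discussed in the paper [18]. Using the function $f_\alpha(z)=(\alpha-1)^{-1}z^\alpha$, for $\ker(B)\subseteq\ker(A)$ we obtain

$$S_\alpha(A\|B) = \frac{1}{\alpha-1}\big\langle B^{1/2},(\mathsf{L}_A\mathsf{R}_B^{-1})^\alpha B^{1/2}\big\rangle_{\mathrm{hs}} = \frac{1}{\alpha-1}\,\mathrm{tr}(A^\alpha B^{1-\alpha}) . \tag{3.6}$$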
Adding the term $(1-\alpha)^{-1}\mathrm{tr}(A)$ to the right-hand side of (3.6), we obtain $H_\alpha(A\|B)$ in view of (2.17). One of the most important properties of relative entropies is their monotonicity under the action of trace-preserving completely positive (TPCP) maps [39]. For a general discussion of the role of stochastic maps in quantum theory, see the paper […]. Many fundamental results of quantum information theory are closely related to the monotonicity of the standard relative entropy [20], [24], [40]. General conditions for the monotonicity of the quantum $f$-divergence are obtained in [18]. If $\Phi$ is a TPCP map and the function $f$ is operator convex on $[0;+\infty)$, then

$$S_f(\Phi(A)\|\Phi(B)) \leq S_f(A\|B) . \tag{3.7}$$

Note that the inequality (3.7) has generally been established in [18] under weaker conditions on the maps. From the monotonicity (3.7) we can derive a simple lower bound on the quantum $f$-divergence in terms of a classical one.

Theorem III.1. Let $A,B\in\mathcal{L}_+(\mathcal{H})$, let $f$ be operator convex on $[0;+\infty)$, and let $\Pi_\pm$ be the projectors onto the eigenspaces corresponding to positive and negative eigenvalues of the difference $(A-B)$. There holds

$$S_f(A\|B) \geq S_f(\{u'_\pm\}\|\{v'_\pm\}) , \tag{3.8}$$

where $u'_\pm=\mathrm{tr}(\Pi_\pm A)$ and $v'_\pm=\mathrm{tr}(\Pi_\pm B)$.

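Proof. Using the Jordan decomposition of the Hermitian operator

$$A-B = \sum_\mu\lambda_\mu|\mu\rangle\langle\mu|-\sum_\nu\lambda_\nu|\nu\rangle\langle\nu| , \qquad \lambda_\mu,\lambda_\nu\geq 0 , \tag{3.9}$$

we define projectors $\Pi_+=\sum_\mu|\mu\rangle\langle\mu|$ and $\Pi_-=\sum_\nu|\nu\rangle\langle\nu|$. When the difference $(A-B)$ has zero eigenvalues, the corresponding eigenvectors should be included in the orthonormal sets $\{|\mu\rangle\}$ and $\{|\nu\rangle\}$ anyhow; then $\Pi_++\Pi_-=\mathbb{1}$.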

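Consider the TPCP map $\Phi$ defined as

$$\Phi(A) = \sum_\mu|\mu\rangle\langle\mu|A|\mu\rangle\langle\mu|+\sum_\nu|\nu\rangle\langle\nu|A|\nu\rangle\langle\nu| = \sum_\mu u_\mu|\mu\rangle\langle\mu|+\sum_\nu u_\nu|\nu\rangle\langle\nu| , \tag{3.10}$$

where $u_\mu=\langle\mu|A|\mu\rangle\geq 0$ and $u_\nu=\langle\nu|A|\nu\rangle\geq 0$. Putting $v_\mu=\langle\mu|B|\mu\rangle$ and $v_\nu=\langle\nu|B|\nu\rangle$, we further write

$$\Phi(B) = \sum_\mu v_\mu|\mu\rangle\langle\mu|+\sum_\nu v_\nu|\nu\rangle\langle\nu| . \tag{3.11}$$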

So the outputs $\Phi(A)$ and $\Phi(B)$ are diagonal in the same basis. Combining this fact with the inequality (3.7), we obtain

$$S_f(A\|B) \geq S_f(\Phi(A)\|\Phi(B)) = S_f(\{u_\mu,u_\nu\}\|\{v_\mu,v_\nu\}) . \tag{3.12}$$
The final step is to use the monotonicity in the classical regime. With $d=\dim\mathcal{H}$, let us put the 2-by-$d$ transition probability matrix

$$T = \begin{pmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \end{pmatrix} , \tag{3.13}$$

in which the units of the first row act on $\mu$-components and the units of the second row act on $\nu$-components. This matrix maps the ordered sets $\{u_\mu,u_\nu\}$ and $\{v_\mu,v_\nu\}$ to $\{u'_+,u'_-\}$ and $\{v'_+,v'_-\}$, respectively. It is clear that

$$u'_+ = \sum_\mu u_\mu = \mathrm{tr}(\Pi_+A) , \qquad u'_- = \sum_\nu u_\nu = \mathrm{tr}(\Pi_-A) , \tag{3.14}$$

and analogously for $v'_\pm$ with $B$ instead of $A$. By the monotonicity, the right-hand side of (3.12) is larger than or equal to the right-hand side of (3.8). ■
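The classical step of the proof, monotonicity of (3.1) under the merging matrix $T$ of (3.13), can be illustrated numerically. In this sketch the helper names are hypothetical: `coarse_grain` plays the role of $T$ by summing the first block of components and then the rest, and `f_alpha` is the function $f_\alpha(z)=(\alpha-1)^{-1}z^\alpha$. The random positive vectors are an arbitrary test ensemble, and strictly positive second arguments are assumed so that (3.1) is finite.

```python
import random


def f_divergence(p, q, f):
    """Classical Csiszar f-divergence (3.1), sum_x q(x) f(p(x)/q(x)), assuming every q(x) > 0."""
    return sum(y * f(x / y) for x, y in zip(p, q))


def f_alpha(alpha):
    """f_alpha(z) = z**alpha / (alpha - 1); convex on [0, inf) for every positive alpha != 1."""
    return lambda z: z ** alpha / (alpha - 1)


def coarse_grain(v, k):
    """Action of the 2-by-d matrix T of (3.13): merge the first k components, then the rest."""
    return [sum(v[:k]), sum(v[k:])]


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.random() + 1e-3 for _ in range(6)]
    v = [rng.random() + 1e-3 for _ in range(6)]
    f = f_alpha(0.5)
    print(f_divergence(coarse_grain(u, 3), coarse_grain(v, 3), f), "<=", f_divergence(u, v, f))
```

Subtracting $(\alpha-1)^{-1}$ from `f_divergence(P, Q, f_alpha(alpha))` for normalized $P$ and $Q$ reproduces the right-hand side of (2.10).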

The function $z\mapsto z^\alpha$ is operator concave on $\mathcal{L}_+(\mathcal{H})$ for $0<\alpha<1$ (see, e.g., theorem 4.2.3 in […]). So the function $f_\alpha(z)=(\alpha-1)^{-1}z^\alpha$ is operator convex for $0<\alpha<1$. Combining this with the inequality (3.8), we then obtain

$$S_\alpha(A\|B) \geq S_\alpha(\{u'_\pm\}\|\{v'_\pm\}) . \tag{3.15}$$

For density operators, we have $u'_\pm=\mathrm{tr}(\Pi_\pm\rho)$ and $v'_\pm=\mathrm{tr}(\Pi_\pm\sigma)$. Up to notation, the result (3.15) with density operators was presented in […] (see theorem IV.1 therein). Writing the relation

$$\|A-B\|_1 = |u'_+-v'_+|+|u'_--v'_-| , \tag{3.16}$$

we should estimate the right-hand side of (3.8) from below in terms of the distance $\|A-B\|_1$.
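The inequality (3.15) and the relation (3.16) can be probed numerically. The sketch below works under stated assumptions: it restricts to real symmetric $3\times 3$ density matrices, so that a small cyclic Jacobi eigensolver written in plain Python suffices, and it draws them from an arbitrary Gaussian ensemble. It compares $S_\alpha(\rho\|\sigma)$ computed via (3.6) with the classical value $S_\alpha(\{u'_\pm\}\|\{v'_\pm\})$ for $0<\alpha<1$. All function names are hypothetical, and a random test of this kind only illustrates the theorem.

```python
import math
import random

def jacobi_eigh(a, sweeps=60):
    """Eigenvalues and eigenvectors (columns of v) of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # right-multiply a and v by the rotation (columns p, q)
                    for k in range(n):
                        x, y = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):  # left-multiply a by the transposed rotation (rows p, q)
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * x - s * y, s * x + c * y
    return [a[i][i] for i in range(n)], v

def column(m, i):
    return [row[i] for row in m]

def quad(m, e):
    """<e| m |e> for a real vector e."""
    return sum(e[i] * m[i][j] * e[j] for i in range(len(e)) for j in range(len(e)))

def petz_trace(a, b, alpha):
    """tr(A^alpha B^(1-alpha)) = sum_ij a_i^alpha b_j^(1-alpha) <a_i|b_j>^2 for real PSD matrices."""
    la, va = jacobi_eigh(a)
    lb, vb = jacobi_eigh(b)
    total = 0.0
    for i, x in enumerate(la):
        for j, y in enumerate(lb):
            overlap = sum(s * r for s, r in zip(column(va, i), column(vb, j)))
            total += max(x, 0.0) ** alpha * max(y, 0.0) ** (1 - alpha) * overlap ** 2
    return total

def pinched_weights(a, b):
    """(u'_+, u'_-, v'_+, v'_-, ||A - B||_1) of Theorem III.1; zero eigenvalues go to the minus side."""
    n = len(a)
    lam, w = jacobi_eigh([[a[i][j] - b[i][j] for j in range(n)] for i in range(n)])
    up = um = vp = vm = 0.0
    for i, l in enumerate(lam):
        e = column(w, i)
        if l > 0:
            up, vp = up + quad(a, e), vp + quad(b, e)
        else:
            um, vm = um + quad(a, e), vm + quad(b, e)
    return max(up, 0.0), max(um, 0.0), max(vp, 0.0), max(vm, 0.0), sum(abs(l) for l in lam)

def random_density(n, rng):
    """Real density matrix M M^T / tr(M M^T) with Gaussian M (an arbitrary test ensemble)."""
    m = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    a = [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    tr = sum(a[i][i] for i in range(n))
    return [[x / tr for x in row] for row in a]

def worst_violations(alpha, trials=200, n=3, seed=0):
    """Largest violation of (3.15) and of (3.16) over random pairs; both should be about zero or below."""
    rng = random.Random(seed)
    worst315 = worst316 = -math.inf
    for _ in range(trials):
        rho, sigma = random_density(n, rng), random_density(n, rng)
        up, um, vp, vm, norm1 = pinched_weights(rho, sigma)
        quantum = petz_trace(rho, sigma, alpha) / (alpha - 1)  # S_alpha(rho||sigma), eq. (3.6)
        classical = (up ** alpha * vp ** (1 - alpha) + um ** alpha * vm ** (1 - alpha)) / (alpha - 1)
        worst315 = max(worst315, classical - quantum)
        worst316 = max(worst316, abs(norm1 - abs(up - vp) - abs(um - vm)))
    return worst315, worst316
```

The Jacobi method was chosen only to keep the example free of third-party packages; with a linear-algebra library one would call its symmetric eigensolver instead.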
IV. PINSKER TYPE INEQUALITIES FOR $\alpha\in(0;1)$

Studies of distinguishability measures and relations between them constitute an actual issue of quantum information theory. The Pinsker inequality […] and its quantum analog, expressed as [19]

$$H_1(\rho\|\sigma) \geq 2D(\rho,\sigma)^2 , \tag{4.1}$$



are well-known results of this kind. Various lower and upper bounds on the relative entropy (2.9) were obtained in Ref. […]. Being given in terms of the trace distance, these bounds characterize a continuity property in the sense of Fannes [2]. Recall that Fannes' inequality bounds from above a potential change of the von Neumann entropy in terms of the trace norm distance [10]. Fannes' inequality has been extended to the Tsallis $q$-entropy […] and its partial sums [29]. The authors of the paper […] proved the inequalities

$$H_\alpha(\rho\|\sigma) \leq H_1(\rho\|\sigma) \leq H_\beta(\rho\|\sigma) , \tag{4.2}$$

where $0<\alpha<1$ and $1<\beta<2$. So, for $1<\beta<2$ the relative entropy $H_\beta(\rho\|\sigma)$ is bounded from below by the right-hand side of (4.1). More detailed lower bounds on the relative entropy (2.9) are presented in Ref. [2]. By (4.2), these lower bounds are all valid for $H_\beta(\rho\|\sigma)$ with $1<\beta<2$.

Let $\Pi_+$ be the projector onto the eigenspace corresponding to positive eigenvalues of the difference $(\rho-\sigma)$. For normalized density operators, the inequality (3.15) together with the definitions (2.10) and (2.14) leads to the bound

$$H_\alpha(\rho\|\sigma) \geq H_\alpha(\{u,1-u\}\|\{v,1-v\}) , \tag{4.3}$$

where we write $u=\mathrm{tr}(\Pi_+\rho)$ and $v=\mathrm{tr}(\Pi_+\sigma)$ for brevity. Denoting $t=|u-v|$, we also have $\|\rho-\sigma\|_1=2t$ and $D(\rho,\sigma)=t$. In the paper […], for $u,v\in[0;1]$ we have proved the inequality (see lemma IV.2 therein)

$$\sqrt{uv}+\sqrt{(1-u)(1-v)} \leq \sqrt{1-t^2} , \tag{4.4}$$



whence $H_{1/2}(\{u,1-u\}\|\{v,1-v\})\geq 2\big(1-\sqrt{1-t^2}\big)$ and, therefore, $H_{1/2}(\rho\|\sigma)\geq D(\rho,\sigma)^2$, since $1-\sqrt{1-t^2}\geq t^2/2$. We shall now estimate from below the Tsallis relative entropy (2.17) for arbitrary $\alpha\in(0;1)$.



Lemma IV.1. Let $u,v\in[0;1]$ and $g(t)=1-\sqrt{1-t^2}$. For $\alpha\in[0;1/2]$ and $t=|u-v|$, there holds

$$u^\alpha v^{1-\alpha}+(1-u)^\alpha(1-v)^{1-\alpha} \leq 1-2\alpha g(t) . \tag{4.5}$$

Proof. For fixed $u$ and $v$, we define the function

$$\Phi_{uv}(\alpha) = u^\alpha v^{1-\alpha}+(1-u)^\alpha(1-v)^{1-\alpha}+2\alpha g(t)-1 . \tag{4.6}$$

The claim (4.5) is equivalent to the inequality $\Phi_{uv}(\alpha)\leq 0$ for all $\alpha\in[0;1/2]$. First, we obviously have $\Phi_{uv}(0)=0$; second, $\Phi_{uv}(1/2)\leq 0$ in view of the relation (4.4). Third, $\Phi_{uv}(\alpha)$ is a convex function of the parameter $\alpha$. Indeed, for $u,v\neq 0,1$ we write down



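$$\frac{\partial^2\Phi_{uv}}{\partial\alpha^2} = u^\alpha v^{1-\alpha}\Big(\ln\frac{u}{v}\Big)^2+(1-u)^\alpha(1-v)^{1-\alpha}\Big(\ln\frac{1-u}{1-v}\Big)^2 \geq 0 . \tag{4.7}$$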



If a convex function is nonpositive at the end points of some interval, it is nonpositive everywhere in this interval. ■
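As a sanity check (not a substitute for the proof), the two sides of (4.5) can be compared on a grid of $(u,v)$ values. The helper names below are ours; by Lemma IV.1 the computed gap should be nonnegative for every $\alpha\in[0;1/2]$, with equality at $\alpha=0$ and along $u=v$.

```python
import math


def g(t):
    """g(t) = 1 - sqrt(1 - t^2) from Lemma IV.1."""
    return 1.0 - math.sqrt(1.0 - t * t)


def lemma_gap(u, v, alpha):
    """Right-hand side minus left-hand side of (4.5)."""
    lhs = u ** alpha * v ** (1 - alpha) + (1 - u) ** alpha * (1 - v) ** (1 - alpha)
    return 1.0 - 2.0 * alpha * g(abs(u - v)) - lhs


def worst_gap(alpha, steps=50):
    """Minimum of lemma_gap over the (steps + 1) x (steps + 1) grid on [0, 1]^2."""
    grid = [k / steps for k in range(steps + 1)]
    return min(lemma_gap(u, v, alpha) for u in grid for v in grid)


if __name__ == "__main__":
    for alpha in (0.1, 0.25, 0.5):
        print(alpha, worst_gap(alpha))
```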

Combining the inequality (3.15) with the statement of Lemma IV.1 gives a lower continuity estimate on the Tsallis relative $\alpha$-entropy for $\alpha\in(0;1)$. We formulate it for two positive operators with equal traces.



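Theorem IV.2. Let $A,B\in\mathcal{L}_+(\mathcal{H})$, $\mathrm{tr}(A)=\mathrm{tr}(B)=\theta$, $D(A,B)=\tau$ and $g(t)=1-\sqrt{1-t^2}$. For all $\alpha\in(0;1)$, there holds

$$H_\alpha(A\|B) \geq \kappa_\alpha\,\theta\,g(\tau/\theta) , \tag{4.8}$$

where the factor $\kappa_\alpha=2\alpha(1-\alpha)^{-1}$ for $0<\alpha\leq 1/2$ and $\kappa_\alpha=2$ for $1/2\leq\alpha<1$.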

Proof. Using the theorem precondition and (3.16), we have $u'_++u'_-=v'_++v'_-=\theta$ and $\|A-B\|_1=2(u'_+-v'_+)$, whence $\tau=u'_+-v'_+$. It then follows from (3.15) that

$$(1-\alpha)S_\alpha(A\|B) \geq (1-\alpha)S_\alpha(\{u'_\pm\}\|\{v'_\pm\}) = -\theta\big[u^\alpha v^{1-\alpha}+(1-u)^\alpha(1-v)^{1-\alpha}\big] , \tag{4.9}$$

where $u=u'_+/\theta$, $v=v'_+/\theta$, and $u-v=\tau/\theta$. Since the map (3.10) is trace-preserving, $\mathrm{tr}(\Phi(A))=\mathrm{tr}(A)=\theta$. Combining this with (2.17) and (4.9) leads to

$$(1-\alpha)H_\alpha(A\|B) \geq \theta\big[1-u^\alpha v^{1-\alpha}-(1-u)^\alpha(1-v)^{1-\alpha}\big] . \tag{4.10}$$

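Due to the inequality (4.5), for $0<\alpha\leq 1/2$ the right-hand side of (4.10) is not less than $2\alpha\theta g(\tau/\theta)$. Hence the claim (4.8) with $\kappa_\alpha=2\alpha(1-\alpha)^{-1}$ follows. For $1/2\leq\alpha<1$, we put $\beta=1-\alpha\in(0;1/2]$ and, applying (4.5) with the roles of $u$ and $v$ interchanged, also write

$$\beta H_\alpha(A\|B) \geq \theta\big[1-u^{1-\beta}v^\beta-(1-u)^{1-\beta}(1-v)^\beta\big] \geq 2\beta\theta g(\tau/\theta) . \tag{4.11}$$

Hence the claim (4.8) with $\kappa_\alpha=2$ follows. ■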
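In the commutative case, that is, for diagonal $A$ and $B$, Theorem IV.2 can be tested on random nonnegative lists rescaled to a common sum $\theta$. The sketch below uses hypothetical helper names and an arbitrary random ensemble; the left-hand side is computed from the first line of (2.12) and the right-hand side from (4.8).

```python
import math
import random


def tsallis_rel(a, b, alpha):
    """Tsallis relative alpha-entropy of nonnegative lists for 0 < alpha < 1 (first line of (2.12))."""
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(a, b) if x > 0)
    return (s - sum(a)) / (alpha - 1)


def kappa(alpha):
    """The factor kappa_alpha of Theorem IV.2."""
    return 2.0 * alpha / (1.0 - alpha) if alpha <= 0.5 else 2.0


def rhs_48(a, b, alpha):
    """kappa_alpha * theta * g(tau / theta) from (4.8), for lists with equal sums theta."""
    theta = sum(a)
    tau = 0.5 * sum(abs(x - y) for x, y in zip(a, b))
    r = min(tau / theta, 1.0)  # guard against rounding slightly above 1
    return kappa(alpha) * theta * (1.0 - math.sqrt(1.0 - r * r))


def random_pair(n, rng):
    """Two random nonnegative lists rescaled to the same sum (an arbitrary test ensemble)."""
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    scale = sum(a) / sum(b)
    return a, [y * scale for y in b]


if __name__ == "__main__":
    a, b = random_pair(5, random.Random(0))
    print(tsallis_rel(a, b, 0.5), ">=", rhs_48(a, b, 0.5))
```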

For probability distributions, the lower continuity bound (4.8) is rewritten with the classical trace distance $\tau=D(P,Q)$. Expanding the function $g(\tau/\theta)$ into a power series, we obtain a family of lower bounds of Pinsker type. Due to the binomial theorem, we actually have

$$H_\alpha(A\|B) \geq \kappa_\alpha\,\theta\sum_{n=1}^{\infty}\binom{1/2}{n}(-1)^{n+1}\Big(\frac{\tau}{\theta}\Big)^{2n} . \tag{4.12}$$

For all $n$, the coefficient $\binom{1/2}{n}(-1)^{n+1}$ is positive. So any partial sum of the series (4.12) provides a lower continuity bound. In particular, for normalized density operators we have the quadratic bound

$$H_\alpha(\rho\|\sigma) \geq \frac{\kappa_\alpha}{2}\,D(\rho,\sigma)^2 . \tag{4.13}$$

For $1/2\leq\alpha<1$, the multiplier $\kappa_\alpha/2=1$, whereas for $\alpha=1$ we have the quadratic bound (4.1) with the multiplier two. This is evidence that the series (4.12) does not provide an expansion with the best constants at the powers of the trace distance. For the standard relative entropy $H_1(P\|Q)$, these constants have been the subject of long-time research (for details, see [12] and references therein). It would be interesting to develop methods for finding the best constants in Pinsker type inequalities for the Tsallis relative $\alpha$-entropy. For instance, we could try to extend the approaches that are known for the case $\alpha=1$. Nevertheless, the results (4.8) and (4.12) resolve the question in a general sense, since they provide non-trivial lower continuity bounds on the Tsallis relative $\alpha$-entropy. For $\alpha\in(0;1)$, we also combine (2.16) with (4.8) and hence obtain

$$R_\alpha(\rho\|\sigma) \geq \frac{1}{\alpha-1}\ln\big[1-(1-\alpha)\kappa_\alpha g(\tau)\big] \geq \kappa_\alpha g(\tau) , \tag{4.14}$$

where $\tau=D(\rho,\sigma)$. Indeed, the function $(\alpha-1)^{-1}\ln[1-(1-\alpha)x]$ increases with $x\in[0;(1-\alpha)^{-1})$ and $-\ln(1-\xi)\geq\xi$ for $\xi\in[0;1)$. The inequality (4.14) can be regarded as a bound of Pinsker type on the Renyi relative entropy for $\alpha\in(0;1)$. In particular, we have a version of (4.13) with $R_\alpha(\rho\|\sigma)$ instead of $H_\alpha(\rho\|\sigma)$. Thus, we have obtained a family of lower continuity bounds in terms of the trace distance on both the relative entropies (2.14) and (2.15).
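The power series behind (4.12) and the Renyi bound (4.14) are easy to probe numerically. In this sketch (hypothetical names; arbitrary test data) `binom_half(n)` computes $\binom{1/2}{n}$ by the product formula, the partial sums of $g(t)$ are compared with the closed form $1-\sqrt{1-t^2}$, and (4.14) is checked on random pairs of full-support probability vectors, which corresponds to the commutative case.

```python
import math
import random


def binom_half(n):
    """Generalized binomial coefficient C(1/2, n) = prod_{k<n} (1/2 - k) / (k + 1)."""
    c = 1.0
    for k in range(n):
        c *= (0.5 - k) / (k + 1)
    return c


def g_partial(t, terms):
    """Partial sum of g(t) = 1 - sqrt(1 - t^2) = sum_{n>=1} C(1/2, n) (-1)^(n+1) t^(2n)."""
    return sum(binom_half(n) * (-1) ** (n + 1) * t ** (2 * n) for n in range(1, terms + 1))


def kappa(alpha):
    """The factor kappa_alpha of Theorem IV.2."""
    return 2.0 * alpha / (1.0 - alpha) if alpha <= 0.5 else 2.0


def renyi_rel(p, q, alpha):
    """Renyi relative alpha-entropy (2.11)."""
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(p, q) if x > 0)
    return math.log(s) / (alpha - 1)


def random_distribution(n, rng):
    w = [rng.random() + 1e-6 for _ in range(n)]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    print([round(g_partial(0.6, m), 6) for m in (1, 2, 5, 20)], 1 - math.sqrt(1 - 0.36))
```

Since every term of the series is positive, each partial sum stays below $g(t)$, which is exactly why any truncation of (4.12) is still a valid lower bound.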

V. UPPER CONTINUITY BOUNDS FOR $\alpha>1$

One of the basic features of the standard relative entropy is its unboundedness. The relative $\alpha$-entropy shares this feature for $\alpha>1$. So we may ask about the behaviour of the functional $H_\alpha(P\|Q)$ as the minimal probability in $Q$ goes to zero. Of course, in the quantum case this question is more difficult due to non-commutativity. For the standard relative entropy, such an upper bound was presented in […], and more bounds were given in [2]. For the quantum relative $\alpha$-entropy of order $\alpha>1$, upper continuity bounds in terms of the minimal eigenvalue of its second entry were obtained in [34]. It turns out that in the commutative case these bounds can be sharpened significantly. Our derivation will be mainly based on joint convexity. Namely, for each positive $\alpha\neq 1$ the quantity (2.12) enjoys

$$H_\alpha\big(\theta A^{(1)}+(1-\theta)A^{(2)}\,\big\|\,\theta B^{(1)}+(1-\theta)B^{(2)}\big) \leq \theta H_\alpha(A^{(1)}\|B^{(1)})+(1-\theta)H_\alpha(A^{(2)}\|B^{(2)}) \tag{5.1}$$

for all $0\leq\theta\leq 1$. This relation can be proved by means of the so-called "generalized log-sum inequality" (see formula (16) in […]). The properties (2.13) and (5.1) lead to the following statement.

Lemma V.1. Let $A$, $B$, $C$ be three sets of positive numbers, and $\hat A$, $\hat B$, $\hat C$ three positive operators. There holds

$$H_\alpha(A+C\|B+C) \leq H_\alpha(A\|B) \qquad (0<\alpha<\infty) , \tag{5.2}$$

$$H_\alpha(\hat A+\hat C\|\hat B+\hat C) \leq H_\alpha(\hat A\|\hat B) \qquad (0<\alpha<2) . \tag{5.3}$$
Proof. Using (2.13) and (5.1), we merely write

$$H_\alpha(A+C\|B+C) = 2H_\alpha\Big(\frac{A+C}{2}\Big\|\frac{B+C}{2}\Big) \leq H_\alpha(A\|B)+H_\alpha(C\|C) = H_\alpha(A\|B) \tag{5.4}$$

in view of the obvious equality $H_\alpha(C\|C)=0$. The quantum relative entropy (2.17) also enjoys both the homogeneity of degree one and the joint convexity, but the latter only for $0<\alpha<2$ (see, e.g., the review [20]). Rewriting the above arguments with the quantum relative $\alpha$-entropy instead of the classical one, we arrive at the claim (5.3). ■
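Inequality (5.2) says that adding a common nonnegative part $C$ to both arguments cannot increase the Tsallis relative entropy. A quick randomized check of the classical statement follows; the function names are ours, and the entries of $B$ and $C$ are shifted away from zero only to keep the test numbers well conditioned.

```python
import random


def tsallis_rel(a, b, alpha):
    """First line of (2.12) for nonnegative a and strictly positive b (alpha > 0, alpha != 1)."""
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(a, b) if x > 0)
    return (s - sum(a)) / (alpha - 1)


def min_gain(alpha, trials=1000, n=4, seed=3):
    """Minimum of H(A||B) - H(A+C||B+C) over random lists; (5.2) predicts it is >= 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a = [rng.random() for _ in range(n)]
        b = [rng.random() + 0.1 for _ in range(n)]
        c = [rng.random() + 0.1 for _ in range(n)]
        ac = [x + z for x, z in zip(a, c)]
        bc = [y + z for y, z in zip(b, c)]
        worst = min(worst, tsallis_rel(a, b, alpha) - tsallis_rel(ac, bc, alpha))
    return worst


if __name__ == "__main__":
    for alpha in (0.5, 2.0):
        print(alpha, min_gain(alpha))
```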

For the standard relative entropy (2.9), the relation (5.3) was proved in […]. The inequality (5.2) can be utilized for obtaining an upper bound on $H_\alpha(P\|Q)$ in terms of the trace distance $D(P,Q)$ and the minimal probability

$$q_0 := \min\{q_j : j\in\Omega_P\} . \tag{5.5}$$

Here we use the fact that any sum in $H_\alpha(P\|Q)$ is effectively restricted to the index subset $\Omega_P$. Defining the set $\Delta=P-Q$ with elements $\delta_j=p_j-q_j$, we put another set $\tilde Q$ with positive elements

$$\tilde q_j := \max\{q_0,-\delta_j\} . \tag{5.6}$$

Writing $Q=\tilde Q+(Q-\tilde Q)$ and using the property (5.2) with $C=Q-\tilde Q$, we obtain

$$H_\alpha(P\|Q) = H_\alpha\big(\Delta+\tilde Q+(Q-\tilde Q)\,\big\|\,\tilde Q+(Q-\tilde Q)\big) \leq H_\alpha(\Delta+\tilde Q\|\tilde Q) . \tag{5.7}$$

Use of (5.2) is correct here due to positivity of both $\Delta+\tilde Q$ and $Q-\tilde Q$: we clearly have $\delta_j+\max\{q_0,-\delta_j\}\geq 0$ and $q_j-\max\{q_0,q_j-p_j\}\geq 0$. The maximization of the right-hand side of (5.7) is under the conditions $\sum_j\delta_j=0$ and $\sum_j|\delta_j|=2D(P,Q)$. We separately consider the two cases, $D(P,Q)\leq q_0$ and $q_0\leq D(P,Q)\leq 1-q_0$.
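Theorem V.2. Let $q_0$ be defined by (5.5), $\Omega_Q\subseteq\Omega_P$, and $\tau=D(P,Q)$. For $\alpha>1$, the Tsallis relative $\alpha$-entropy is bounded from above as

$$H_\alpha(P\|Q) \leq \frac{1}{\alpha-1}\Big((q_0+\tau)^\alpha q_0^{1-\alpha}+(q_0-\tau)^\alpha q_0^{1-\alpha}-2q_0\Big) \qquad (\tau\leq q_0) , \tag{5.8}$$

$$H_\alpha(P\|Q) \leq \frac{1}{\alpha-1}\Big((q_0+\tau)^\alpha q_0^{1-\alpha}-(q_0+\tau)\Big) \qquad (q_0\leq\tau\leq 1-q_0) . \tag{5.9}$$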

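Proof. In the case $\tau\leq q_0$, we have $\max\{q_0,-\delta_j\}=q_0$ for all $j$. By $x_i$ and $(-y_j)$ we respectively denote the positive and negative elements of the set $\Delta=\{\delta_j\}$ (zero elements contribute nothing below). The conditions $\sum_j\delta_j=0$ and $\sum_j|\delta_j|=2\tau$ are rewritten as

$$\sum_{i\in\omega_x}x_i = \sum_{j\in\omega_y}y_j = \tau , \tag{5.10}$$

where $\omega_x$ and $\omega_y$ are the corresponding subsets of the set $\Omega_P$, of cardinalities $n_x$ and $n_y$. The right-hand side of (5.7) is represented as the function

$$F(x_i,y_j) = \frac{1}{\alpha-1}\Big(\sum_{i\in\omega_x}(q_0+x_i)^\alpha q_0^{1-\alpha}+\sum_{j\in\omega_y}(q_0-y_j)^\alpha q_0^{1-\alpha}-(n_x+n_y)q_0\Big) . \tag{5.11}$$

This function should be maximized under the conditions $0\leq x_i$, $0\leq y_j$ and (5.10), which define a convex polytope. Recall that the global maximum of a convex function relative to a convex set is reached at one of the extreme points of that set […]. Hence the maximum of $F(x_i,y_j)$ is equal to the right-hand side of (5.8) and is reached when one of the $x_i$'s and one of the $y_j$'s are equal to $\tau$ and the others are all zero.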

In the case $q_0<\tau$, we add negative elements $(-z_k)$ such that $q_0<z_k$ and $\sum_{j\in\omega_y}y_j+\sum_{k\in\omega_z}z_k=\tau$. Before maximization, we further simplify the right-hand side of (5.7). Putting the sets $\Delta_x=\{x_i\}$, $\Delta_y=\{y_j\}$ and $\Delta_z=\{z_k\}$, we write

$$\Delta = \Delta_x-\Delta_y-\Delta_z , \qquad \tilde Q = q_0(I_x+I_y)+\Delta_z , \qquad \Delta+\tilde Q = q_0(I_x+I_y)+\Delta_x-\Delta_y . \tag{5.12}$$

Here $I_x$ denotes the indicator of the set $\omega_x$, taking the value one for $j\in\omega_x$ and zero for $j\notin\omega_x$. Note that if the set $\Omega_Z$ intersects neither $\Omega_A$ nor $\Omega_B$, then $H_\alpha(A\|B+Z)=H_\alpha(A\|B)$. Using this fact twice and the inequality (5.2) again with positive $C=q_0I_y-\Delta_y$, we rewrite the right-hand side of (5.7) in the form

$$\begin{aligned} H_\alpha\big(q_0(I_x+I_y)+\Delta_x-\Delta_y\,\big\|\,q_0(I_x+I_y)+\Delta_z\big) &= H_\alpha\big(q_0I_x+\Delta_x+C\,\big\|\,q_0I_x+\Delta_y+C\big) \\ &\leq H_\alpha\big(q_0I_x+\Delta_x\,\big\|\,q_0I_x+\Delta_y\big) = H_\alpha\big(q_0I_x+\Delta_x\,\big\|\,q_0I_x\big) . \end{aligned} \tag{5.13}$$
The latter is expressed as $G(x_i)=(\alpha-1)^{-1}\big(\sum_{i\in\omega_x}(q_0+x_i)^\alpha q_0^{1-\alpha}-(n_xq_0+\tau)\big)$, where $n_x$ is the cardinality of $\omega_x$. Under the conditions $0\leq x_i$ and $\sum_{i\in\omega_x}x_i=\tau$, the maximum of $G(x_i)$ is equal to the right-hand side of (5.9) and is reached when one of the $x_i$'s is $\tau$ and the others are all zero. ■
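The bounds of Theorem V.2 can be compared with the exact value of $H_\alpha(P\|Q)$ on random full-support distributions, for which $\Omega_Q\subseteq\Omega_P$ holds trivially. The sketch below uses hypothetical helper names and an arbitrary random ensemble. It also covers the extremal situation suggested by the proof: a uniform $Q$ with one component of $P$ raised by $\tau$ and another lowered by $\tau$, where (5.8) should hold with equality.

```python
import random


def tsallis_rel(p, q, alpha):
    """Tsallis relative alpha-entropy (2.10) of probability lists with q > 0 on the support of p."""
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(p, q) if x > 0)
    return (s - 1.0) / (alpha - 1)


def bound_v2(p, q, alpha):
    """Right-hand side of (5.8) if tau <= q0, otherwise of (5.9); requires alpha > 1."""
    q0 = min(y for x, y in zip(p, q) if x > 0)  # minimal probability (5.5)
    tau = 0.5 * sum(abs(x - y) for x, y in zip(p, q))  # trace distance D(P, Q)
    top = (q0 + tau) ** alpha * q0 ** (1 - alpha)
    if tau <= q0:
        return (top + (q0 - tau) ** alpha * q0 ** (1 - alpha) - 2 * q0) / (alpha - 1)
    return (top - (q0 + tau)) / (alpha - 1)


def random_distribution(n, rng):
    w = [rng.random() + 0.02 for _ in range(n)]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    P, Q = [0.9, 0.1], [0.2, 0.8]
    print(tsallis_rel(P, Q, 2.0), "<=", bound_v2(P, Q, 2.0))
```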

The upper bounds (5.8) and (5.9) behave as $q_0^{1-\alpha}$ with respect to the minimal probability $q_0$. For the quantum relative entropy $H_\alpha(\rho\|\sigma)$, upper continuity bounds with a similar dependence on the minimal eigenvalue of $\sigma$ were obtained in our previous paper [34]. The bounds (5.8) and (5.9) are stronger, but their proof is essentially restricted to the commutative case. The principal point is that positivity of the diagonal elements of a matrix does not imply positivity of the matrix itself (except for diagonal matrices). So the proof of Theorem V.2 is purely classical in character. On the other hand, the weaker bounds in the paper [34] have been proved for the quantum case. Note that the inequalities (5.8) and (5.9) can be rewritten in terms of the $\alpha$-logarithm as

$$H_\alpha(P\|Q) \leq -(q_0+\tau)\ln_\alpha\Big(\frac{q_0}{q_0+\tau}\Big)-(q_0-\tau)\ln_\alpha\Big(\frac{q_0}{q_0-\tau}\Big) \qquad (\tau<q_0) , \tag{5.14}$$

$$H_\alpha(P\|Q) \leq -(q_0+\tau)\ln_\alpha\Big(\frac{q_0}{q_0+\tau}\Big) \qquad (q_0\leq\tau\leq 1-q_0) . \tag{5.15}$$

Using (2.16) in the classical regime, we see that the bounds (5.14) and (5.15) remain valid with $R_\alpha(P\|Q)$ instead of $H_\alpha(P\|Q)$. This claim follows from the facts that $\ln[1+(\alpha-1)x]$ increases with positive $x$ and $\ln(1+\xi)\leq\xi$ for $\xi\geq 0$. The relations (5.14) and (5.15) are $\alpha$-parametric extensions of the upper bounds obtained in Ref. [2] for the standard relative entropy. In effect, the upper continuity bounds (5.8) and (5.9) could be used for classical systems which are well described within non-extensive thermostatistics.
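The passage from (5.8)-(5.9) to the $\alpha$-logarithmic forms (5.14)-(5.15) is a short algebraic step. A numerical spot check on an arbitrary parameter grid (helper names are ours) confirms that both forms agree, and that the Renyi counterpart $(\alpha-1)^{-1}\ln[1+(\alpha-1)B]$ of a bound $B$ never exceeds $B$.

```python
import math


def ln_alpha(z, alpha):
    """alpha-logarithm (z**(1 - alpha) - 1) / (1 - alpha), alpha != 1."""
    return (z ** (1 - alpha) - 1) / (1 - alpha)


def bound_power_form(q0, tau, alpha):
    """Right-hand side of (5.8) for tau <= q0 and of (5.9) for tau > q0."""
    top = (q0 + tau) ** alpha * q0 ** (1 - alpha)
    if tau <= q0:
        return (top + (q0 - tau) ** alpha * q0 ** (1 - alpha) - 2 * q0) / (alpha - 1)
    return (top - (q0 + tau)) / (alpha - 1)


def bound_log_form(q0, tau, alpha):
    """Right-hand side of (5.14) for tau < q0 and of (5.15) otherwise."""
    b = -(q0 + tau) * ln_alpha(q0 / (q0 + tau), alpha)
    if tau < q0:  # at tau == q0 the second term of (5.14) vanishes for alpha > 1
        b -= (q0 - tau) * ln_alpha(q0 / (q0 - tau), alpha)
    return b


def renyi_from_tsallis(h, alpha):
    """Map H -> (alpha - 1)^(-1) ln[1 + (alpha - 1) H], the classical form of (2.16)."""
    return math.log(1 + (alpha - 1) * h) / (alpha - 1)


if __name__ == "__main__":
    print(bound_power_form(0.1, 0.05, 2.0), bound_log_form(0.1, 0.05, 2.0))
```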



VI. NOTES ON FANO AND FANNES INEQUALITIES 



In this section, we will obtain upper bounds on the conditional Tsallis $\alpha$-entropy for all $\alpha>0$. It is convenient to change the notation slightly as follows. Let $X$ and $Y$ be discrete random variables with probabilities $\{p_X(x)\}$ and $\{p_Y(y)\}$, each supported on the $N$-point set $\Omega$. By $p_{XY}(x,y)$ and $p_{X|Y}(x|y)$ we respectively denote the joint and conditional probabilities. The joint $\alpha$-entropy and the conditional $\alpha$-entropy are respectively defined as

$$H_\alpha(X,Y) := \frac{1}{1-\alpha}\Big(\sum_{x,y}p_{XY}(x,y)^\alpha-1\Big) , \tag{6.1}$$

$$H_\alpha(X|Y) := \sum_y p_Y(y)^\alpha H_\alpha(X|y) , \tag{6.2}$$

where $H_\alpha(X|y)=(1-\alpha)^{-1}\big(\sum_x p_{X|Y}(x|y)^\alpha-1\big)$. Rewriting $H_\alpha(X|y)=-\sum_x p_{X|Y}(x|y)^\alpha\ln_\alpha p_{X|Y}(x|y)$, we further obtain

$$H_\alpha(X|Y) = -\sum_{x,y}p_{XY}(x,y)^\alpha\ln_\alpha p_{X|Y}(x|y) , \tag{6.3}$$

due to the relation $p_Y(y)p_{X|Y}(x|y)=p_{XY}(x,y)$. We will follow the original scheme of derivation (see the classical text [11], section 6.2). Taking the value of $Y$ as the guess for $X$, the probability of error is expressed as

$$P_e = \sum_y p_Y(y)\,q(e|y) , \qquad q(e|y) = 1-p_{X|Y}(y|y) = \sum_{x\neq y}p_{X|Y}(x|y) . \tag{6.4}$$

Lemma VI.1. For all $\alpha\in(0;\infty)$, there holds

$$H_\alpha(X|Y) \leq \sum_y p_Y(y)^\alpha h_\alpha(q(e|y))+\ln_\alpha(N-1)\sum_y p_Y(y)^\alpha q(e|y)^\alpha . \tag{6.5}$$

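Proof. Using the expression for $q(e|y)$ and the definition (2.5), we write

$$\begin{aligned} H_\alpha(X|y) &= -p_{X|Y}(y|y)^\alpha\ln_\alpha p_{X|Y}(y|y)-\sum_{x\neq y}p_{X|Y}(x|y)^\alpha\ln_\alpha p_{X|Y}(x|y) \\ &= h_\alpha(q(e|y))+q(e|y)^\alpha\ln_\alpha q(e|y)-\sum_{x\neq y}p_{X|Y}(x|y)^\alpha\ln_\alpha p_{X|Y}(x|y) . \end{aligned} \tag{6.6}$$

Due to $q(e|y)=\sum_{x\neq y}p_{X|Y}(x|y)$ and the properties of the $\alpha$-logarithm, the second and third terms in the right-hand side of (6.6) are combined as

$$\begin{aligned} &-\sum_{x\neq y}p_{X|Y}(x|y)^\alpha\ln_\alpha p_{X|Y}(x|y)+q(e|y)\,q(e|y)^{\alpha-1}\ln_\alpha q(e|y) \\ &\quad= -\sum_{x\neq y}p_{X|Y}(x|y)^\alpha\Big(\ln_\alpha p_{X|Y}(x|y)-p_{X|Y}(x|y)^{1-\alpha}q(e|y)^{\alpha-1}\ln_\alpha q(e|y)\Big) \\ &\quad= -\sum_{x\neq y}p_{X|Y}(x|y)^\alpha\Big(\ln_\alpha p_{X|Y}(x|y)+p_{X|Y}(x|y)^{1-\alpha}\ln_\alpha\frac{1}{q(e|y)}\Big) \end{aligned} \tag{6.7}$$

$$\quad= -q(e|y)^\alpha\sum_{x\neq y}\frac{p_{X|Y}(x|y)^\alpha}{q(e|y)^\alpha}\ln_\alpha\frac{p_{X|Y}(x|y)}{q(e|y)} \leq q(e|y)^\alpha\ln_\alpha(N-1) . \tag{6.8}$$

Here we used the identities $\ln_\alpha(1/\xi)=-\xi^{\alpha-1}\ln_\alpha\xi$ (right before (6.7)) and $\ln_\alpha(\xi z)=\ln_\alpha\xi+\xi^{1-\alpha}\ln_\alpha z$ (right before (6.8)). The final inequality holds because the $N-1$ numbers $p_{X|Y}(x|y)/q(e|y)$ with $x\neq y$ form a probability distribution, whose Tsallis entropy does not exceed $\ln_\alpha(N-1)$. Substituting (6.8) into (6.6) and further into (6.2), we obtain (6.5). ■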



Theorem VI.2. Let random variables $X$ and $Y$ take values on the same finite set with cardinality $N$. For a given value of the error probability $P_e$, the conditional entropy $H_\alpha(X|Y)$ is bounded from above as

$$H_\alpha(X|Y) \leq \frac{P_e^\alpha-\alpha P_e}{1-\alpha}+P_e^\alpha\ln_\alpha[N(N-1)] \qquad (0<\alpha<1) , \tag{6.9}$$

$$H_\alpha(X|Y) \leq h_\alpha(P_e)+P_e^\alpha\ln_\alpha(N-1) \qquad (1<\alpha<\infty) . \tag{6.10}$$

Proof. For $\alpha\in(0;1)$, we use the estimate $h_\alpha(u)=(1-\alpha)^{-1}[u^\alpha+(1-u)^\alpha-1]\leq(1-\alpha)^{-1}(u^\alpha-\alpha u)$, which follows from (2.5) and the inequality

$$1-(1-u)^\alpha = \int_0^u\alpha(1-t)^{\alpha-1}\,dt \geq \int_0^u\alpha\,dt = \alpha u . \tag{6.11}$$

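By these relations and $\sum_y p_Y(y)q(e|y)=P_e$, the first sum in the right-hand side of (6.5) is no larger than

$$\frac{1}{1-\alpha}\sum_y p_Y(y)^\alpha\big[q(e|y)^\alpha-\alpha q(e|y)\big] \leq \frac{1}{1-\alpha}\Big(\sum_y\xi_y^\alpha-\alpha P_e\Big) , \qquad \xi_y := p_Y(y)q(e|y) , \tag{6.12}$$

in view of $\sum_y p_Y(y)^\alpha q(e|y)\geq\sum_y p_Y(y)q(e|y)=P_e$. Using the Holder inequality, we also obtain

$$\max\Big\{\sum_{y=1}^N\xi_y^\alpha : 0\leq\xi_y\leq 1,\ \sum_{y=1}^N\xi_y=P_e\Big\} = N^{1-\alpha}P_e^\alpha , \tag{6.13}$$

which is reached for $\xi_y=P_e/N$. So the term $(1-\alpha)^{-1}(N^{1-\alpha}P_e^\alpha-\alpha P_e)$ is an upper bound for the right-hand side of (6.12). Combining this with the product of $\ln_\alpha(N-1)$ and (6.13) (note that $p_Y(y)^\alpha q(e|y)^\alpha=\xi_y^\alpha$), and using $N^{1-\alpha}\ln_\alpha(N-1)=\ln_\alpha[N(N-1)]-\ln_\alpha N$, finally gives (6.9).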

Using $p_Y(y)^\alpha\leq p_Y(y)$ for $\alpha>1$ and Jensen's inequality for the concave function (2.5), we have

$$\sum_y p_Y(y)^\alpha h_\alpha(q(e|y)) \leq \sum_y p_Y(y)h_\alpha(q(e|y)) \leq h_\alpha\Big(\sum_y p_Y(y)q(e|y)\Big) = h_\alpha(P_e) . \tag{6.14}$$

For $\alpha>1$, there holds $\sum_y p_Y(y)^\alpha q(e|y)^\alpha\leq\big(\sum_y p_Y(y)q(e|y)\big)^\alpha=P_e^\alpha$ (this can be rewritten as $\|\cdot\|_\alpha\leq\|\cdot\|_1$ in terms of vector norms). By these two points, the inequality (6.5) leads to (6.10). ■
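Theorem VI.2 lends itself to a randomized check. The sketch below follows the convention of (6.4) that the value of $Y$ serves as the guess for $X$. The joint distributions come from an arbitrary ensemble, with an optional boost of the diagonal to produce small error probabilities. All function names are ours.

```python
import random


def ln_alpha(z, alpha):
    return (z ** (1 - alpha) - 1) / (1 - alpha)


def h_alpha(u, alpha):
    """Binary Tsallis entropy (2.5)."""
    return (u ** alpha + (1 - u) ** alpha - 1) / (1 - alpha)


def conditional_tsallis(joint, alpha):
    """H_alpha(X|Y) from (6.2); joint[x][y] = p_XY(x, y)."""
    n = len(joint)
    total = 0.0
    for y in range(n):
        py = sum(joint[x][y] for x in range(n))
        if py > 0:
            s = sum((joint[x][y] / py) ** alpha for x in range(n) if joint[x][y] > 0)
            total += py ** alpha * (s - 1) / (1 - alpha)
    return total


def error_probability(joint):
    """P_e of (6.4) when the value of Y is used as the guess for X."""
    return max(0.0, 1.0 - sum(joint[y][y] for y in range(len(joint))))


def fano_bound(pe, n, alpha):
    """Right-hand side of (6.9) for 0 < alpha < 1, or of (6.10) for alpha > 1."""
    if alpha < 1:
        return (pe ** alpha - alpha * pe) / (1 - alpha) + pe ** alpha * ln_alpha(n * (n - 1), alpha)
    return h_alpha(pe, alpha) + pe ** alpha * ln_alpha(n - 1, alpha)


def random_joint(n, rng, boost=0.0):
    """Random joint pmf; `boost` adds weight to the diagonal to make P_e small (test ensemble only)."""
    w = [[rng.random() + (boost if x == y else 0.0) for y in range(n)] for x in range(n)]
    z = sum(map(sum, w))
    return [[v / z for v in row] for row in w]


if __name__ == "__main__":
    joint = random_joint(3, random.Random(0), boost=5.0)
    pe = error_probability(joint)
    print(pe, conditional_tsallis(joint, 0.5), fano_bound(pe, 3, 0.5))
```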

For $\alpha>1$, the inequality (6.10) with $P_e$ instead of $P_e^\alpha$ was presented in [13]. Since $P_e^\alpha\leq P_e$ for $\alpha>1$, we obtain an improvement of the known result. The inequality (6.9) for $0<\alpha<1$ is a new bound. By construction, the bound (6.9) is not sharp. Nevertheless, this inequality is sufficiently exact for small values of $P_e$. In any event, both bounds of Theorem VI.2 show that $P_e\to 0$ implies $H_\alpha(X|Y)\to 0$. On the other hand, if $H_\alpha(X|Y)$ is large, then the probability of making an error in inference must be large as well. In this regard, the essence of our inequalities concurs with a typical use of the standard Fano inequality in terms of Shannon entropies.

Uniform continuity is an important property of the von Neumann entropy. The first result in this direction was given by Fannes [10]. The Tsallis entropy itself […] and its partial sums also enjoy this property. Using the classical Fano inequality, Fannes' bound has been sharpened (see theorem 3.8 in [26] and Csiszar's proof given there). We shall now show that the Fano type inequalities (6.9) and (6.10) lead to Fannes type inequalities in terms of Tsallis entropies. Using properties of the $\alpha$-logarithm, the joint entropy (6.1) can be re-expressed as [13]

$$H_\alpha(X,Y) = H_\alpha(X)+H_\alpha(Y|X) = H_\alpha(Y)+H_\alpha(X|Y) . \tag{6.15}$$
Hence, in view of $H_\alpha(Y|X)\geq 0$, the difference $H_\alpha(X)-H_\alpha(Y)\leq H_\alpha(X|Y)$ is bounded from above by the right-hand side of (6.9) for $\alpha\in(0;1)$ and by the right-hand side of (6.10) for $\alpha\in(1;+\infty)$. For given probability distributions $\{p_X(x)\}$ and $\{p_Y(y)\}$, the joint probability mass function $p_{XY}(x,y)$ can be built in such a way that $P_e=D(X,Y)=(1/2)\sum_x|p_X(x)-p_Y(x)|$; this follows from the coupling inequality (see, e.g., the book [22]). Setting $\{p_X(x)\}=\mathrm{spec}(\rho)$ and $\{p_Y(y)\}=\mathrm{spec}(\sigma)$, with both spectra arranged in decreasing order, we then have $D(X,Y)\leq D(\rho,\sigma)$ (see, e.g., lemma 11.1 in [26]). Assuming $N\geq 2$, the right-hand side of (6.9) increases with $P_e$ for all $0\leq P_e\leq 1$, and the right-hand side of (6.10) increases with $P_e$ for all $0\leq P_e\leq(N-1)/N$. Replacing $D(X,Y)$ with the larger $D(\rho,\sigma)$, we get the following result.
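The joint distribution with $P_e=D(X,Y)$ can be written down explicitly. One standard construction, shown here as an illustrative assumption rather than as the construction used in [22], is the so-called maximal coupling. It places the common mass $\min\{p_X(x),p_Y(x)\}$ on the diagonal and spreads the leftover masses over off-diagonal cells. The sketch (hypothetical names; fixed test marginals) also checks the chain rule (6.15) numerically.

```python
def maximal_coupling(px, py):
    """Joint pmf with row marginal px, column marginal py, and P(X != Y) = D(px, py)."""
    n = len(px)
    common = [min(a, b) for a, b in zip(px, py)]
    d = 1.0 - sum(common)  # equals the trace distance (1/2) sum |px - py|
    joint = [[common[x] if x == y else 0.0 for y in range(n)] for x in range(n)]
    if d > 0:
        for x in range(n):
            for y in range(n):
                # leftover masses px - common and py - common never overlap on the diagonal
                joint[x][y] += (px[x] - common[x]) * (py[y] - common[y]) / d
    return joint


def tsallis(p, alpha):
    """Tsallis alpha-entropy (2.4), alpha != 1; zero entries are skipped."""
    return (sum(v ** alpha for v in p if v > 0) - 1) / (1 - alpha)


def conditional_tsallis(joint, alpha):
    """H_alpha(X|Y) from (6.2); joint[x][y] = p_XY(x, y)."""
    n = len(joint)
    total = 0.0
    for y in range(n):
        py = sum(joint[x][y] for x in range(n))
        if py > 0:
            s = sum((joint[x][y] / py) ** alpha for x in range(n) if joint[x][y] > 0)
            total += py ** alpha * (s - 1) / (1 - alpha)
    return total


if __name__ == "__main__":
    J = maximal_coupling([0.5, 0.3, 0.2], [0.2, 0.2, 0.6])
    print(J, 1 - sum(J[i][i] for i in range(3)))  # error probability equals D = 0.4
```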

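Theorem VI.3. Let $d$ be the dimensionality of the Hilbert space and $\tau=D(\rho,\sigma)$; then

$$|H_\alpha(\rho)-H_\alpha(\sigma)| \leq \frac{\tau^\alpha-\alpha\tau}{1-\alpha}+\tau^\alpha\ln_\alpha[d(d-1)] \qquad (0<\alpha<1,\ 0\leq\tau\leq 1) , \tag{6.16}$$

$$|H_\alpha(\rho)-H_\alpha(\sigma)| \leq h_\alpha(\tau)+\tau^\alpha\ln_\alpha(d-1) \qquad \Big(1<\alpha,\ 0\leq\tau\leq\frac{d-1}{d}\Big) . \tag{6.17}$$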

The inequality (6.17), when $\alpha>1$, is just the uniform estimate derived by a direct method in [42]. This inequality is the best bound known for the Tsallis entropies in the parameter range $\alpha>1$. In the limit $\alpha\to 1$, the inequality (6.17) reproduces the statement of theorem 3.8 in [26]. For $\alpha\in(0;1)$, there exists another inequality

$$|H_\alpha(\rho)-H_\alpha(\sigma)| \leq \frac{(2\tau)^\alpha-2\tau}{1-\alpha}+(2\tau)^\alpha\ln_\alpha d , \tag{6.18}$$

provided that $\|\rho-\sigma\|_1=2\tau\leq\alpha^{1/(1-\alpha)}$. The bound (6.18) was actually proved in the paper […] for all $\alpha\in[0;2]$, but the bound (6.17) is better for $\alpha>1$. Comparing our bound (6.16) with (6.18), we see the following. In general, the bound (6.16) is weaker but covers all acceptable values $\tau\in[0;1]$ of the trace distance, whereas the scope of (6.18) is restricted to the range $0\leq 2\tau\leq\alpha^{1/(1-\alpha)}$. In low dimensions, however, the bound (6.16) can be better than the bound (6.18). Say, for $d=2$ and $\alpha=1/2$ the bound (6.18) holds for $0\leq\tau\leq 1/8$. In this range, the right-hand side of (6.18) is larger than the right-hand side of (6.16). Moreover, for sufficiently small $\tau$ the difference between the two bounds is up to 40%. Thus, the bound (6.16) has some practical interest, at least in the primary qubit case.
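The qubit comparison above can be reproduced in a few lines. This sketch (our own function names) evaluates the right-hand sides of (6.16) and (6.18) for $d=2$ and $\alpha=1/2$ on the admissible range $0<\tau\leq 1/8$ of (6.18). In this case they reduce to $2\sqrt{2}\sqrt{\tau}-\tau$ and $4\sqrt{\tau}-4\tau$, so their ratio tends to $\sqrt{2}\approx 1.41$ as $\tau\to 0$. In other words, (6.18) exceeds (6.16) by roughly 40% for small $\tau$.

```python
import math


def ln_alpha(z, alpha):
    return (z ** (1 - alpha) - 1) / (1 - alpha)


def rhs_616(tau, d, alpha):
    """Right-hand side of (6.16): 0 < alpha < 1, 0 <= tau <= 1."""
    return (tau ** alpha - alpha * tau) / (1 - alpha) + tau ** alpha * ln_alpha(d * (d - 1), alpha)


def rhs_618(tau, d, alpha):
    """Right-hand side of (6.18); only valid while 2 * tau <= alpha ** (1 / (1 - alpha))."""
    x = 2 * tau
    return (x ** alpha - x) / (1 - alpha) + x ** alpha * ln_alpha(d, alpha)


if __name__ == "__main__":
    d, alpha = 2, 0.5
    tau_max = alpha ** (1 / (1 - alpha)) / 2  # equals 1/8 for alpha = 1/2
    for tau in (1e-6, 1e-3, 0.05, tau_max):
        print(tau, rhs_616(tau, d, alpha), rhs_618(tau, d, alpha),
              rhs_618(tau, d, alpha) / rhs_616(tau, d, alpha))
    print("small-tau limit of the ratio:", math.sqrt(2))
```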